OTUs selected by >30% of the models generated from all OTUs

Using the following bacteria: Blautia (OTU 2), Blautia (OTU 4), Enterococcus (OTU 5), Enterobacteriaceae (OTU 15), Clostridium_XlVb (OTU 30), Prevotellaceae (OTU 81), Coprobacillus (OTU 101), Alloprevotella (OTU 118), Holdemania (OTU 199), Clostridium_XVIII (OTU 200), Robinsoniella (OTU 250)
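As a rough illustration of the >30% selection rule above, the sketch below counts how often each OTU appears across a set of models and keeps those above the threshold. The function name, the input layout (one list of selected OTUs per model), and the toy data are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def otus_selected_by_fraction(model_features, min_fraction=0.30):
    """Return OTUs selected by more than min_fraction of the models.

    model_features: list of lists, one list of OTU names per model
    (an assumed layout, not the original data structure).
    """
    n_models = len(model_features)
    if n_models == 0:
        return []
    # Count each OTU at most once per model.
    counts = Counter(otu for feats in model_features for otu in set(feats))
    return sorted(otu for otu, n in counts.items() if n / n_models > min_fraction)


# Hypothetical toy input: four models and the OTUs each one selected.
toy_models = [
    ["Otu00002", "Otu00004"],
    ["Otu00004", "Otu00005"],
    ["Otu00004"],
    ["Otu00005", "Otu00015"],
]
selected = otus_selected_by_fraction(toy_models)
print(selected)
```

In this toy input, only OTUs present in more than 30% of the four models are kept, so an OTU that appears in a single model (25%) is dropped.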

CFU and relative abundance plots

How does removing the low samples (Day 8 and CFUs around 0) affect prediction?

Removing these data points improves predictability for days 6 and 8 but not for the earlier days. If I increase the cutoff to remove all CFU values below 5, there is a slight increase in day 3 CFU classification at the cost of a decrease in day 1 CFU classification.
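The cutoff step can be sketched as a simple filter applied before refitting the models. The record layout, the field names `day` and `cfu`, and the toy values are assumptions for illustration; the notes do not state the CFU scale, so the cutoff is just passed through as a number.

```python
def drop_low_cfu(samples, cutoff):
    """Keep only samples whose CFU value is at or above the cutoff."""
    return [s for s in samples if s["cfu"] >= cutoff]


# Hypothetical toy samples; field names and values are illustrative only.
samples = [
    {"day": 1, "cfu": 7.2},
    {"day": 3, "cfu": 4.1},
    {"day": 8, "cfu": 0.0},
    {"day": 6, "cfu": 5.0},
]
kept = drop_low_cfu(samples, cutoff=5)
print([s["day"] for s in kept])
```

Running the models on `kept` versus `samples` is one way to compare prediction with and without the low-CFU points.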

Are there any abnormalities in the OTUs on any of these days?

Does taking the median CFU improve predictability?



Re-running the program and focusing only on the OTUs predicting day 9/10, to see whether the predictive features are consistent, gives the following selection (including all samples):

Otu00004 Otu00005 Otu00015 Otu00019 Otu00030 Otu00199 Otu00200 Otu00250
      14       17       13       15       13       16       17       19

Boruta confirmed the following OTUs as important for predicting day 9/10 CFU:
Otu00004 Otu00005 Otu00015 Otu00030 Otu00081 Otu00199 Otu00200 Otu00250 Otu00297
      21       22       21       21       21       22       22       22       20

I also selected OTUs by collecting the features from the most predictive community/CFU models (R^2 >= 0.6 and MSE <= 0.8), converting all % Increase in MSE values to relative values, and taking the median value for each OTU. Selecting the OTUs that fall above the median value results in the following OTUs:
“Otu00001” “Otu00002” “Otu00004” “Otu00005” “Otu00012” “Otu00014” “Otu00015” “Otu00016” “Otu00019” “Otu00028” “Otu00030” “Otu00048” “Otu00081” “Otu00101” “Otu00118” “Otu00199” “Otu00200” “Otu00250” “Otu00297”
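The median-based selection described above could be sketched as follows. Two points are assumptions, because the notes do not specify them: "relative values" is taken to mean each model's % Increase in MSE divided by that model's largest value, and "the median value" used as the threshold is taken to be the median of the per-OTU medians. The function name, the dictionary layout, and the toy numbers are hypothetical.

```python
from statistics import median


def select_otus_by_median_importance(models, min_r2=0.6, max_mse=0.8):
    """Select OTUs whose median relative importance exceeds the overall median.

    models: list of dicts with keys "r2", "mse", and "importance"
    (a mapping of OTU name to % Increase in MSE); an assumed layout.
    """
    # Keep only the most predictive models.
    good = [m for m in models if m["r2"] >= min_r2 and m["mse"] <= max_mse]

    per_otu = {}
    for m in good:
        imp = m["importance"]
        if not imp:
            continue
        top = max(imp.values())
        if top <= 0:
            continue  # cannot scale a model with no positive importance
        # Assumption: "relative" means scaled by the model's maximum value.
        for otu, value in imp.items():
            per_otu.setdefault(otu, []).append(value / top)

    otu_median = {otu: median(vals) for otu, vals in per_otu.items()}
    if not otu_median:
        return []
    # Assumption: the threshold is the median of the per-OTU medians.
    threshold = median(otu_median.values())
    return sorted(otu for otu, v in otu_median.items() if v > threshold)


# Hypothetical toy models; the third fails the R^2 and MSE filters.
toy = [
    {"r2": 0.70, "mse": 0.5,
     "importance": {"Otu00004": 10.0, "Otu00005": 5.0, "Otu00015": 2.0}},
    {"r2": 0.65, "mse": 0.7,
     "importance": {"Otu00004": 8.0, "Otu00005": 8.0, "Otu00015": 1.0}},
    {"r2": 0.30, "mse": 1.2, "importance": {"Otu00015": 20.0}},
]
chosen = select_otus_by_median_importance(toy)
print(chosen)
```

With a strict "above the median" rule, roughly half of the candidate OTUs are retained by construction, so the length of the resulting list mostly reflects how many OTUs appear in the qualifying models.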